Adobe's PDF Reference tells us that a PDF file can be understood through the following four areas:
1. Objects. A PDF document is a data structure built from a small set of basic object types.
2. File structure (physical structure). This determines how objects are stored in a PDF file, how they are accessed, and how they are updated. This structure is independent of the semantics of the objects.
3. Document structure. This describes how the basic object types are used to express the components of a PDF document: pages, fonts, annotations, and so on.
4. Content streams. A content stream in a PDF file contains a sequence of instructions that describe the appearance of a page or of another graphical entity.

When I first read these lines, I found them very hard to follow. To know exactly what they meant, I had to read the dozens or hundreds of pages behind them and analyze an actual PDF document. Only later, after a long period of reading the documentation, doing related development, and analyzing specific PDF files, did I work out the syntax of PDF and how a document is parsed. Learning this was painful, and at the time I really wished someone could show me a simple example that described the basic composition of PDF, its principles, and how it is parsed. So below I will use a simple example to illustrate the main features of PDF and to give a simple view of PDF documents.

Before reading on, ask yourself the following questions:
- Do you already understand at least one file format (for example, HTML)?
- Why do you want to learn about PDF?

If your answer to the first question is "yes", and you can give a clear answer to the second, then this article is for you.
Otherwise, if you do not understand any file format, I recommend looking at HTML or XML first; you can draw a lot of insight from those two languages, and it will be of great benefit when you study the composition of PDF. And if you are not sure what you want to learn this for, then you have no purpose or motivation for learning, and what you learn today may well be forgotten tomorrow.

1. PDF compared with HTML and XML:
A PDF document is essentially a sequence of 8-bit bytes. In fact, PDF is a structured document format like the familiar HTML and XML: it contains keywords, delimiters, data, and so on. The difference is that a PDF document is stored as a binary stream, whereas an HTML file is stored as text. An XML document generally contains only the data itself, with no information on how to display it, so displaying an XML document requires a separate file that describes the presentation (such as a stylesheet); otherwise all you see is the raw data. HTML contains the data together with information on how to display it, and because HTML is stored as readable text, you can open an HTML document in a text editor and see all the text that the browser shows. Another difference is that HTML cannot contain binary streams: images and other external files are included through links.

2. Development of the PDF specification:
From 1993 to now the PDF specification has gone through seven versions (six upgrades), from the initial PDF 1.0 to the current PDF 1.6. Each upgrade added new features, and the PDF Reference manual has grown from a little over 100 pages to more than 1000 pages, but the main characteristics of the PDF file format have not changed. PDF 1.6 can be understood as a superset of PDF 1.0: once you have learned PDF 1.0, you can basically understand the content of PDF 1.6. For that reason, the example I analyze below is a simple PDF document based on PDF 1.0.
Upgrades to the PDF specification:
- 1.1 (1995): document encryption (40-bit), article threads, name trees, links, device-independent color resources.
- 1.2 (1996): forms, halftone screens and other advanced color features, Chinese, Japanese, and Korean support.
- 1.3 (2000): digital signatures, logical structure, JavaScript, embedded files, masked images, smooth shading, CID font support, additional color support.
- 1.4 (2001): document encryption (128-bit), Tagged PDF, access controls, transparency, metadata streams.
- 1.5 (2003): document encryption (public key), JPEG 2000 compression, optional content groups, additional annotation types.
- 1.6 (2005): document encryption (AES), larger maximum file size, 3D support, additional annotation types.
3. Basic components of a PDF file:
At the top level, a PDF file is divided into four parts:
- File header: specifies the version of the PDF specification that the document conforms to. It appears on the first line of the PDF file.
- File body: the main part of the PDF file, made up of a series of objects.
- Cross-reference table: an index of the addresses of the indirect objects, built so that indirect objects can be accessed randomly.
- File trailer: states the address of the cross-reference table and identifies the root object of the document body (the Catalog), so that the location of every object in the PDF document can be found and random access achieved. In addition, the trailer holds the PDF file's security (encryption) information, which is discussed in detail later.

The structure is shown in the following figure:
Figure 1
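
The four-part layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real PDF library: the function names build_minimal_pdf and describe_pdf are hypothetical, the three objects are made up for the example, and the parser only handles the single, uncompressed cross-reference table that this sketch itself writes. The resulting file has a blank page and is not guaranteed to open in every viewer.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF 1.0 file: header, body, xref table, trailer."""
    header = b"%PDF-1.0\n"  # file header: declares the specification version
    objects = [  # file body: a series of indirect objects (illustrative only)
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Resources << >> "
        b"/MediaBox [0 0 612 792] >>\nendobj\n",
    ]
    body = b""
    offsets = []
    for obj in objects:
        offsets.append(len(header) + len(body))  # byte offset of this object
        body += obj
    xref_offset = len(header) + len(body)
    # Cross-reference table: one 20-byte entry per object number.
    xref = b"xref\n0 %d\n" % (len(objects) + 1)
    xref += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    # Trailer: names the root object (Catalog) and the xref table's address.
    trailer = b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    trailer += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return header + body + xref + trailer


def describe_pdf(data: bytes) -> dict:
    """Locate the four parts of a file produced by build_minimal_pdf."""
    # Header: the first line carries the version number.
    version = re.match(rb"%PDF-(\d\.\d)", data).group(1).decode("ascii")
    # Trailer: the number after 'startxref' is the xref table's byte offset.
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    xref_offset = int(tail.group(1))
    # Cross-reference table: maps object numbers to byte offsets.
    xref_end = data.index(b"trailer", xref_offset)
    lines = data[xref_offset:xref_end].splitlines()
    if lines[0] != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    offsets = {}
    for num, entry in enumerate(lines[2:2 + count], start=first):
        off, _gen, flag = entry.split()
        if flag == b"n":  # 'n' marks an object in use, 'f' a free entry
            offsets[num] = int(off)
    # Trailer dictionary: /Root gives the object number of the Catalog.
    root = int(re.search(rb"/Root\s+(\d+)\s+\d+\s+R", data[xref_end:]).group(1))
    # Body: random access, jumping straight to the Catalog via its offset.
    start = offsets[root]
    catalog = data[start:data.index(b"endobj", start)]
    return {"version": version, "xref_offset": xref_offset,
            "offsets": offsets, "root": root, "catalog": catalog}


if __name__ == "__main__":
    pdf = build_minimal_pdf()
    for key, value in describe_pdf(pdf).items():
        print(key, "=", value)
```

Note that the reader works backwards: it starts at the end of the file, uses the trailer to find the cross-reference table, and only then jumps into the body. This is what makes random access possible without scanning the whole file.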